{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.interactiveshell import InteractiveShell\n",
    "InteractiveShell.ast_node_interactivity = 'all'  # default is ‘last_expr'\n",
    "\n",
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "sys.path.append('/Users/siyuyang/Source/repos/GitHub_MSFT/CameraTraps')  # append this repo to PYTHONPATH"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "from collections import Counter, defaultdict\n",
    "from random import sample\n",
    "import math\n",
    "\n",
    "from tqdm import tqdm\n",
    "from unidecode import unidecode \n",
    "\n",
    "from data_management.megadb.schema import sequences_schema_check\n",
    "from data_management.annotations.add_bounding_boxes_to_megadb import *\n",
    "from data_management.megadb.converters.cct_to_megadb import make_cct_embedded, process_sequences, write_json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Import Sul Ross 2018 sets\n",
    "Initial drop's class labels extracted from EXIF fields."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Give the path to a JSON file where output from this script will be written to. You can then take this file to the .Net app for ingestion to the database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "path_to_output = '/Users/siyuyang/OneDrive - Microsoft/AI4Earth/CameraTrap/Databases/megadb_2020/sulross_2018_megadb.json'  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Name of the dataset**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset_name = 'sulross_2018'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 0 - Add an entry to the `datasets` table\n",
    "\n",
    "Added "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1 - Prepare the `sequence` objects to insert into the database"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 1a - If you have metadata in COCO Camera Traps (CCT) format already...\n",
    "\n",
    "For a dataset, you probably have one or two JSONs in the CCT format, one containing image-level species labels and another containing bounding box annotations. Here we combine them and embed any annotation items into the image items."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# path to the CCT json, or a loaded json object\n",
    "path_to_image_cct = '/Users/siyuyang/Source/temp_data/CameraTrap/engagements/SulRoss/20190522/Database/sulross_20190530.json'  # set to None if not available\n",
    "path_to_bbox_cct = None  # set to None if not available\n",
    "assert not (path_to_image_cct is None and path_to_bbox_cct is None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading image DB...\n",
      "Number of items from the image DB: 493627\n",
      "Number of images with more than 1 species: 0 (0.0% of image DB)\n",
      "No bbox DB provided.\n"
     ]
    }
   ],
   "source": [
    "embedded = make_cct_embedded(image_db=path_to_image_cct, bbox_db=path_to_bbox_cct)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the following step, properties will be moved to the highest level that is still correct, i.e. if a property at the image-level always has the smae value for all images in a sequence, it will be moved to be a sequence-level property.\n",
    "\n",
    "If a sequence-level property has the same value throughout this dataset (often 'rights holder'), it will be removed from the `sequence` objects. A message about this will be printed, and you should add that property and its (constant) value to this dataset's entry in the `datasets` table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The dataset_name is set to sulross_2018. Please make sure this is correct!\n",
      "Making a deep copy of docs...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  7%|▋         | 33442/493627 [00:00<00:01, 334339.81it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Putting 493627 images into sequences...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 493627/493627 [00:01<00:00, 324137.03it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of sequences: 172769\n",
      "Checking the location field...\n",
      "Checking which fields in a CCT image entry are sequence-level...\n",
      "\n",
      "all_img_properties\n",
      "{'id', 'location', 'datetime', 'class', 'file', 'frame_num'}\n",
      "\n",
      "img_level_properties\n",
      "{'id', 'datetime', 'class', 'file', 'frame_num'}\n",
      "\n",
      "image-level properties that really should be sequence-level\n",
      "{'location'}\n",
      "\n",
      "Finished processing sequences.\n",
      "Example sequence items:\n",
      "\n",
      "{'seq_id': 'Summer2018/S1/Summer2018__S1__2018-06-26__15-51', 'dataset': 'sulross_2018', 'images': [{'id': 'Summer2018/S1/Summer2018__S1__2018-06-26__15-51-19(1)', 'frame_num': 1, 'datetime': '2018-06-26 15:51:19', 'file': 'Summer2018/S1/Summer2018__S1__2018-06-26__15-51-19(1).JPG', 'class': ['empty']}, {'id': 'Summer2018/S1/Summer2018__S1__2018-06-26__15-51-36(2)', 'frame_num': 2, 'datetime': '2018-06-26 15:51:36', 'file': 'Summer2018/S1/Summer2018__S1__2018-06-26__15-51-36(2).JPG', 'class': ['empty']}, {'id': 'Summer2018/S1/Summer2018__S1__2018-06-26__15-51-56(3)', 'frame_num': 3, 'datetime': '2018-06-26 15:51:56', 'file': 'Summer2018/S1/Summer2018__S1__2018-06-26__15-51-56(3).JPG', 'class': ['empty']}], 'location': 'Summer2018+S1'}\n",
      "\n",
      "[{'seq_id': 'Summer2018/H6/Summer2018__H6__2018-07-06__16-07', 'dataset': 'sulross_2018', 'images': [{'id': 'Summer2018/H6/Summer2018__H6__2018-07-06__16-07-36(1)', 'frame_num': 1, 'datetime': '2018-07-06 16:07:36', 'file': 'Summer2018/H6/Summer2018__H6__2018-07-06__16-07-36(1).JPG', 'class': ['empty']}], 'location': 'Summer2018+H6'}]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "sequences = process_sequences(embedded, dataset_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# sample some sequences to make sure they are what you expect\n",
    "sample(sequences, 10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Some sequence frame numbers are not unique"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "problem_seqs = []\n",
    "\n",
    "for seq in sequences:\n",
    "    frame_nums = []\n",
    "    for image in seq['images']:\n",
    "        frame_nums.append(image['frame_num'])\n",
    "    if len(set(frame_nums)) != len(seq['images']):\n",
    "        problem_seqs.append(seq['seq_id'])\n",
    "\n",
    "len(problem_seqs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Summer2018/D9/Summer2018__D9__2018-10-08__17-48']"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "problem_seqs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a domestic cattle sequence that has two image both labeled with frame 1. Getting rid of this sequence."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "172768"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "good_seqs = []\n",
    "for seq in sequences:\n",
    "    if seq['seq_id'] != 'Summer2018/D9/Summer2018__D9__2018-10-08__17-48':\n",
    "        good_seqs.append(seq)\n",
    "len(good_seqs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2 - Pass the schema check"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Verified that the sequence items meet requirements not captured by the schema.\n",
      "Verified that the sequence items conform to the schema.\n"
     ]
    }
   ],
   "source": [
    "sequences_schema_check.sequences_schema_check(good_seqs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3 - Add any iMerit bbox annotations\n",
    "\n",
    "Only classification labels."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4 - Save the `sequence` items to a file\n",
    "\n",
    "You can now take the resulting JSON file to the .Net application for bulk insertion to the database:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(path_to_output, 'w') as f:\n",
    "    json.dump(good_seqs, f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can check that the bounding box annotations and paths to images all survived by running the `visualization/visualize_megadb.py` using the above exported file. \n",
    "\n",
    "Done"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:cameratraps] *",
   "language": "python",
   "name": "conda-env-cameratraps-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
